Intelligent anti-jamming decision algorithms for wireless communication based on Deep Reinforcement Learning (DRL) can gradually optimize anti-jamming strategies without prior knowledge by continuously interacting with the jamming environment, and they have become one of the most active research directions in communication anti-jamming. To address the joint anti-jamming problem in scenarios with multiple users and no prior knowledge of the jamming power, this paper proposes an intelligent anti-jamming decision algorithm for wireless communication based on Multi-Agent Proximal Policy Optimization (MAPPO). The algorithm adopts the centralized training and decentralized execution (CTDE) framework, which lets each user make independent decisions while fully exploiting the local information of all users during training. Specifically, during the learning phase the algorithm shares all users' perceptions, actions, and rewards to form a global state. It then computes the value function and advantage function of each user from this global state and uses them to optimize each user's independent policy. During execution, each user makes its anti-jamming decision based solely on its local perception results and its own policy. In addition, because MAPPO can handle continuous action spaces, each user's transmit power can gradually approach the optimal value within the allowed power range even without prior knowledge of the jamming power. Simulation results show that, under frequency sweeping jamming and dynamic probabilistic jamming, the proposed algorithm converges significantly faster and to higher values than Deep Q-Network (DQN), Q-Learning (QL), and random frequency hopping algorithms.
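The CTDE pipeline described above can be illustrated with a minimal sketch of generic MAPPO building blocks. This is not the authors' implementation. The helper names (`build_global_state`, `compute_gae`, `ppo_clip_objective`, `gaussian_log_prob`, `clip_power`) and the hyperparameters (`gamma`, `lam`, `eps`) are hypothetical. The sketch assumes that a centralized critic supplies value estimates computed from the concatenated global state, that advantages come from Generalized Advantage Estimation (GAE), and that each user's continuous transmit power is drawn from a Gaussian policy and clipped to the allowed range.

```python
import math
from typing import List, Sequence


def build_global_state(observations: Sequence[Sequence[float]],
                       actions: Sequence[float],
                       rewards: Sequence[float]) -> List[float]:
    """Concatenate every user's local perception, action and reward
    (assumption: a simple flat concatenation forms the global state)."""
    state: List[float] = []
    for obs, act, rew in zip(observations, actions, rewards):
        state.extend(obs)
        state.append(act)
        state.append(rew)
    return state


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                last_value: float, gamma: float = 0.99,
                lam: float = 0.95) -> List[float]:
    """GAE for one user's trajectory; `values` are assumed to come from
    a centralized critic V(global_state)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # TD residual using the centralized value estimates
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def ppo_clip_objective(ratios: Sequence[float],
                       advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized) for one
    user's decentralized policy; ratios are pi_new / pi_old."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(min(r, 1.0 + eps), 1.0 - eps)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of a Gaussian policy over a continuous action such as
    transmit power."""
    return (-((x - mean) ** 2) / (2.0 * std ** 2)
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


def clip_power(power: float, p_min: float, p_max: float) -> float:
    """Keep a sampled transmit power inside the allowed power range."""
    return max(p_min, min(p_max, power))
```

In this sketch, only `build_global_state` and the critic values fed to `compute_gae` use information from all users, which corresponds to the centralized training phase. `gaussian_log_prob` and `clip_power` need only the user's own policy output, which mirrors decentralized execution.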